Abstract. Steepest descent drives both theory and practice of nonsmooth optimization. We
study slight relaxations of two influential notions of steepest descent curves — curves of maximal 
slope and solutions to evolution equations. In particular, we provide a simple proof showing that 
lower-semicontinuous functions that are locally Lipschitz continuous on their domains — functions 
playing a central role in nonsmooth optimization — admit Lipschitz continuous steepest descent 
curves in both senses. 
1. Introduction. The intuitive notion of steepest descent plays a central role in
theory and practice. So what are steepest descent curves in an entirely nonsmooth
setting? In addressing this question, it is worthwhile to consider the classical setting:
steepest descent curves for smooth functions $f\colon \mathbb{R}^n \to \mathbb{R}$ are simply smooth curves
$x\colon [0,T) \to \mathbb{R}^n$ satisfying the differential equation

$$\dot{x}(t) = -\nabla f(x(t)), \quad \text{for each } t \in [0,T). \tag{1.1}$$

Evidently this gradient dynamical system is equivalent to the pair of scalar equations

$$\|\dot{x}(t)\| = \|\nabla f(x(t))\|, \qquad \frac{d(f\circ x)}{dt}(t) = -\big(\|\nabla f(x(t))\|\big)^2, \quad \text{for each } t \in [0,T). \tag{1.2}$$
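The equivalence between (1.1) and (1.2) can be checked numerically on a simple example. The sketch below is only an illustration: the quadratic test function, the explicit Euler discretization, and the step size are all assumptions made here, not part of the original text. It follows a discretized gradient flow and compares the finite-difference rate of decrease of $f$ with $-\|\nabla f\|^2$.

```python
def f(p):
    # Smooth test function f(x, y) = x^2 + 2*y^2 (an assumption for illustration).
    x, y = p
    return x * x + 2.0 * y * y

def grad_f(p):
    # Gradient of the test function above.
    x, y = p
    return (2.0 * x, 4.0 * y)

def euler_gradient_flow(p0, step, n_steps):
    """Explicit Euler discretization of x'(t) = -grad f(x(t))."""
    path = [p0]
    p = p0
    for _ in range(n_steps):
        g = grad_f(p)
        p = (p[0] - step * g[0], p[1] - step * g[1])
        path.append(p)
    return path

def worst_relative_gap(path, step):
    """Largest relative gap between the finite-difference rate of change of f
    along the path and the predicted value -||grad f||^2."""
    worst = 0.0
    for a, b in zip(path, path[1:]):
        rate = (f(b) - f(a)) / step
        g = grad_f(a)
        target = -(g[0] ** 2 + g[1] ** 2)
        worst = max(worst, abs(rate - target) / abs(target))
    return worst

STEP = 1e-4
path = euler_gradient_flow((1.0, 1.0), STEP, 1000)
gap = worst_relative_gap(path, STEP)
```

For this quadratic the gap is of the order of the step size, so refining the discretization drives it to zero, in line with the second equation in (1.2).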

A number of authors have taken up the task of rigorously modelling steepest descent
in a nonsmooth setting, yielding two influential ideas based on generalizations of
(1.1) and (1.2). Namely, generalizing the former we may instead consider absolutely
continuous curves satisfying the evolution equation

$$\dot{x}(t) \in -\partial f(x(t)), \quad \text{for a.e. } t \in [0,T),$$

where $\partial f$ is some generalized derivative (or subdifferential). See for example [8] [22].
Alternatively, interpreting the quantity $\|\nabla f(x(t))\|$ in (1.2) as the "slope" of $f$ at
$x(t)$ leads to the idea of curves of maximal slope. For more details see [221 1121 [TJ
[T31 U31 [22] . These two approaches have had a major impact on optimization, PDEs,
probability theory, and optimal transport. For a recent expository monograph, see [2].
In the current work, we study evolution equations and curves of near-maximal slope 
— a slight relaxation of curves of maximal slope that is well-suited for nonsmooth 
optimization. 
The question concerning existence of such curves is at the core of the subject.
Roughly speaking, there are two strategies for constructing steepest descent curves
for a function $f$ on $\mathbb{R}^n$. The first one revolves around minimizing $f$ on an increasing
sequence of balls around a point $x_k$ until the radius hits a certain threshold, at which
point one moves the center to the next iterate and repeats the procedure. Passing to
the limit as the thresholds tend to zero, under suitable conditions, yields a curve of
maximal slope. The second approach is based on De Giorgi's generalized movements
[12]. Namely, one builds a piecewise constant curve by declaring the next iterate to be
a minimizer of the function $f$ plus a scaling of the squared distance from the previous
iterate [5, Chapter 2]. The analysis, in both cases, is highly nontrivial and moreover
does not give an intuitive meaning to the parametrization of the curve.

In the current work, we propose an alternative, transparent strategy for constructing
steepest descent curves. Our central object of study is the so-called proximal descent
curve of a function $f$. It is obtained by discretizing the range of $f$ and then building
a piecewise linear curve by projecting iterates onto successive lower-level sets. Under
reasonable conditions, taking the limit as the mesh of the partition tends to zero,
the resulting curve is a Lipschitz continuous steepest descent curve in both senses.
Furthermore, the parametrization of the curve is entirely intuitive: the values of the
function parametrize the curve. In particular, our results are strong enough to deduce
existence of steepest descent curves for lower-semicontinuous functions that are
Lipschitz continuous on their domains, a class of functions of utmost importance in nonsmooth
optimization. From a technical viewpoint, using the function values to parametrize
the curve allows for the deep theory of metric regularity to enter the picture [17] [35],
thereby yielding a simple and elegant existence proof.

The question concerning when solutions of evolution equations and curves of
maximal slope are one and the same has been studied as well. However a major
standing assumption that has so far been needed to establish positive answers in this
direction is that the slope of the function $f$ is itself a lower-semicontinuous function
[2], [22], an assumption that many common functions of nonsmooth optimization
(e.g. $f(x) = \min\{x, 0\}$) do not satisfy. In the current work, we study this question
in absence of such a continuity condition. As a result, semi-algebraic functions
(those functions whose epigraph can be written as a finite union of sets, each defined
by finitely many polynomial inequalities [11] [29]) come to the fore. For
semi-algebraic functions that are locally Lipschitz continuous on their domains, solutions
to evolution equations are one and the same as curves of near-maximal slope. Going a
step further, using an argument based on the Kurdyka-Łojasiewicz inequality, in the
spirit of [Till HU HI [SJ , we show that bounded curves of near-maximal slope for
semi-algebraic functions necessarily have finite length. Consequently, such curves defined
on maximal domains must converge to a generalized critical point of $f$.

In the paper we have restricted ourselves to functions on $\mathbb{R}^n$, although in a number
of key publications (221 [13 E] on the subject the emphasis is on infinite dimensional
situations and applications to calculus of variations. Our choice has been mainly
motivated by the desire to make the basic ideas and the techniques as clear as possible
and not to obscure them by additional technicalities. We hope to consider infinite
dimensional extensions of our approach and results elsewhere. Here we just want to
emphasize that for many of them, this is an easy task at least for the case of a separable
Hilbert (or even separable Asplund) space, provided the function $f$ is coercive, in the
sense that its sublevel sets are closed and bounded (hence weakly compact).

The outline of the manuscript is as follows. Section 2 is a self-contained treatment
of the basics of variational analysis that we will use. In this section, we emphasize
the interplay between slopes and subdifferentials, and that the slope provides a very
precise way of quantifying error bounds (Lemma 2.12). In Section 3 we define our
main objects of interest, proximal descent curves, and analyze their properties
and their relationship to curves of maximal slope and evolution equations. In
Section 4 we strengthen the existence theory from the previous section by means of a
simple viability lemma (Lemma 4.1). In Section 5 we comment on some of the key
assumptions needed to make our existence theory work and on when the two
aforementioned notions of steepest descent coincide. This naturally leads to Section 6,
where we consider additional properties of steepest descent curves for semi-algebraic
functions.

2. Preliminaries. In this section, we summarize some of the fundamental tools 
used in variational analysis and nonsmooth optimization. We refer the reader to the 
monographs of Borwein-Zhu [7], Clarke-Ledyaev-Stern-Wolenski [9], Mordukhovich
[23], and Rockafellar-Wets [26], and to the survey of Ioffe [17], for more details. Unless
otherwise stated, we follow the terminology and notation of [17] and [25].

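2.1. Variational analysis. Consider the extended real line
$\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty\} \cup \{+\infty\}$. We say that an extended-real-valued function is proper if it is never $-\infty$
and is not always $+\infty$. For a function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$, we define the domain of $f$ to
be

$$\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\},$$

and we define the epigraph of $f$ to be

$$\operatorname{epi} f := \{(x,r) \in \mathbb{R}^n \times \mathbb{R} : r \ge f(x)\}.$$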

Throughout this work, we will only use Euclidean norms. Hence for a point
$x \in \mathbb{R}^n$, the symbol $\|x\|$ will denote the standard Euclidean norm of $x$. A function
$f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ is lower-semicontinuous (or lsc for short) at $\bar{x}$ if the inequality
$\liminf_{x \to \bar{x}} f(x) \ge f(\bar{x})$ holds. We will say that $f$ is locally Lipschitz continuous at a
point $\bar{x} \in \mathbb{R}^n$ relative to a set $Q$, if for some $\kappa \ge 0$ the inequality

$$|f(x) - f(y)| \le \kappa \|x - y\| \quad \text{holds for all } x, y \in Q \text{ near } \bar{x}.$$

The infimum of such constants $\kappa$ is the Lipschitz modulus of $f$ at $\bar{x}$ relative to $Q$, and
we will refer to it by $\operatorname{lip} f(\bar{x}; Q)$. If the set $Q$ is not explicitly mentioned, then the
reader should assume that $Q$ is simply $\mathbb{R}^n$. Henceforth, the symbol $o(\|x - \bar{x}\|)$ will
denote a term with the property

$$\frac{o(\|x - \bar{x}\|)}{\|x - \bar{x}\|} \to 0, \quad \text{when } x \to \bar{x} \text{ with } x \ne \bar{x}.$$

The symbols $\operatorname{cl} Q$, $\operatorname{conv} Q$, $\operatorname{cone} Q$, $\operatorname{aff} Q$, and $\operatorname{par} Q$ will denote the topological closure,
the convex hull, the (non-convex) conical hull, the affine span, and the parallel
subspace of $Q$ respectively. An open ball of radius $\epsilon$ around a point $x$ will be denoted by
$B_\epsilon(x)$, while the open unit ball will be denoted by $B$. A primary variational-analytic
method for studying nonsmooth functions on $\mathbb{R}^n$ is by means of subdifferentials.

Definition 2.1 (Subdifferentials). Consider a function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ and a point
$\bar{x}$ with $f(\bar{x})$ finite.
1. The Fréchet subdifferential of $f$ at $\bar{x}$, denoted $\hat{\partial} f(\bar{x})$, consists of all vectors
$v \in \mathbb{R}^n$ satisfying

$$f(x) \ge f(\bar{x}) + \langle v, x - \bar{x} \rangle + o(\|x - \bar{x}\|).$$

2. The limiting subdifferential of $f$ at $\bar{x}$, denoted $\partial f(\bar{x})$, consists of all vectors
$v \in \mathbb{R}^n$ for which there exist sequences $x_i \in \mathbb{R}^n$ and $v_i \in \hat{\partial} f(x_i)$ with
$(x_i, f(x_i), v_i) \to (\bar{x}, f(\bar{x}), v)$.

3. The horizon subdifferential of $f$ at $\bar{x}$, denoted $\partial^\infty f(\bar{x})$, consists of all vectors
$v \in \mathbb{R}^n$ for which there exists a sequence of real numbers $\tau_i \downarrow 0$ and a
sequence of points $x_i \in \mathbb{R}^n$, along with subgradients $v_i \in \hat{\partial} f(x_i)$, satisfying

$$(x_i, f(x_i), \tau_i v_i) \to (\bar{x}, f(\bar{x}), v).$$

4. The Clarke subdifferential of $f$ at $\bar{x}$, denoted $\partial_c f(\bar{x})$, is obtained by the
convexification

$$\partial_c f(\bar{x}) := \operatorname{cl}\operatorname{co}\,[\partial f(\bar{x}) + \partial^\infty f(\bar{x})].$$

We say that $f$ is subdifferentiable at $\bar{x}$ whenever $\partial f(\bar{x})$ is nonempty (equivalently when
$\partial_c f(\bar{x})$ is nonempty).

In particular, every locally Lipschitz continuous function is subdifferentiable. For
$x$ such that $f(x)$ is not finite, we follow the convention that
$\hat{\partial} f(x) = \partial f(x) = \partial^\infty f(x) = \partial_c f(x) = \emptyset$.

The subdifferentials $\hat{\partial} f(\bar{x})$, $\partial f(\bar{x})$, and $\partial_c f(\bar{x})$ generalize the classical notion of
gradient. In particular, for $C^1$-smooth functions $f$ on $\mathbb{R}^n$, these three subdifferentials
consist only of the gradient $\nabla f(x)$ for each $x \in \mathbb{R}^n$. For convex $f$, these
subdifferentials coincide with the convex subdifferential. The horizon subdifferential $\partial^\infty f(\bar{x})$
plays an entirely different role; namely, it detects horizontal "normals" to the
epigraph. In particular, a lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ is locally Lipschitz continuous around
$\bar{x}$ if and only if we have $\partial^\infty f(\bar{x}) = \{0\}$. Moreover, if $f$ is locally Lipschitz continuous
at $\bar{x}$, then we have equality

$$\operatorname{lip} f(\bar{x}) = \max_{v \in \partial f(\bar{x})} \|v\|.$$

For a set $Q \subset \mathbb{R}^n$, we define the indicator function of $Q$, denoted $\delta_Q$, to be zero
on $Q$ and plus infinity elsewhere. The geometric counterparts of subdifferentials are
normal cones.

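Definition 2.2 (Normal cones). Consider a set $Q \subset \mathbb{R}^n$. Then the Fréchet,
limiting, and Clarke normal cones to $Q$ at any point $\bar{x} \in \mathbb{R}^n$ are defined by
$\hat{N}_Q(\bar{x}) := \hat{\partial} \delta_Q(\bar{x})$, $N_Q(\bar{x}) := \partial \delta_Q(\bar{x})$, and $N^c_Q(\bar{x}) := \partial_c \delta_Q(\bar{x})$ respectively.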

A particularly nice situation occurs when all the normal cones coincide. 

Definition 2.3 (Clarke regularity of sets). A set $Q \subset \mathbb{R}^n$ is said to be Clarke
regular at a point $\bar{x} \in Q$ if it is locally closed at $\bar{x}$ and every limiting normal vector
to $Q$ at $\bar{x}$ is a Fréchet normal vector, that is the equation $N_Q(\bar{x}) = \hat{N}_Q(\bar{x})$ holds.

The functional version of Clarke regularity is as follows. 

Definition 2.4 (Subdifferential regularity). A function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ is called
subdifferentially regular at $\bar{x}$ if $f(\bar{x})$ is finite and $\operatorname{epi} f$ is Clarke regular at $(\bar{x}, f(\bar{x}))$ as
a subset of $\mathbb{R}^n \times \mathbb{R}$.

In particular, if $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ is subdifferentially regular at a point $\bar{x} \in \operatorname{dom} f$,
then equality $\hat{\partial} f(\bar{x}) = \partial f(\bar{x})$ holds ([26, Corollary 8.11]). Shortly, we will need the
following result describing normals to lower-level sets [26, Proposition 10.3]. We
provide an independent proof for completeness and ease of reference in future work.
The reader may safely skip it upon first reading.

Proposition 2.5 (Normals to lower-level sets). Consider a lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$
and a point $\bar{x} \in \mathbb{R}^n$ with $0 \notin \partial f(\bar{x})$. Then the inclusion

$$N_{[f \le f(\bar{x})]}(\bar{x}) \subset (\operatorname{cone} \partial f(\bar{x})) \cup \partial^\infty f(\bar{x}) \quad \text{holds.}$$



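Proof. Define the real number $\bar{\alpha} := f(\bar{x})$ and the sets $\mathcal{L}_{\bar{\alpha}} := \{(x, \alpha) : \alpha \le \bar{\alpha}\}$ and

$$Q_{\bar{\alpha}} := (\operatorname{epi} f) \cap \mathcal{L}_{\bar{\alpha}} = \{(x, \alpha) : f(x) \le \alpha \le \bar{\alpha}\}.$$

We first show the implication

$$(x^*, 0) \in N_{Q_{\bar{\alpha}}}(\bar{x}, \bar{\alpha}) \implies x^* \in (\operatorname{cone} \partial f(\bar{x})) \cup \partial^\infty f(\bar{x}). \tag{2.1}$$

Indeed, consider such a vector $(x^*, 0)$. Then fuzzy calculus [17, Chapter 2.1] implies that
there are sequences $(x_{1k}, \alpha_{1k}) \in \operatorname{epi} f$, $(x^*_{1k}, \beta_{1k}) \in \hat{N}_{\operatorname{epi} f}(x_{1k}, \alpha_{1k})$, $(x_{2k}, \alpha_{2k}) \in \mathcal{L}_{\bar{\alpha}}$
and $(x^*_{2k}, \beta_{2k}) \in \hat{N}_{\mathcal{L}_{\bar{\alpha}}}(x_{2k}, \alpha_{2k})$ satisfying

$$(x_{1k}, \alpha_{1k}) \to (\bar{x}, \bar{\alpha}), \quad (x_{2k}, \alpha_{2k}) \to (\bar{x}, \bar{\alpha}), \quad x^*_{1k} + x^*_{2k} \to x^*, \quad \beta_{1k} + \beta_{2k} \to 0.$$

Observe $x^*_{2k} = 0$ and hence $x^*_{1k} \to x^*$. Furthermore, by nature of epigraphs we have
$\beta_{1k} \le 0$. If up to a subsequence we had $\beta_{1k} = 0$, then (2.1) would follow immediately.
Consequently, we may suppose that the inequality $\beta_{1k} < 0$ is valid. Then we have
$|\beta_{1k}|^{-1} x^*_{1k} \in \hat{\partial} f(x_{1k})$. Since the norms of $x^*_{1k}$ are uniformly bounded and we have
$0 \notin \partial f(\bar{x})$, the sequence $\beta_{1k}$ must be bounded. Consequently we may assume that
$\beta_{1k}$ converges to some $\beta$ and (2.1) follows.

Now consider a vector $u^* \in \hat{N}_{[f \le \bar{\alpha}]}(u)$ for some $u \in [f \le \bar{\alpha}]$. Consequently the
inequality $\langle u^*, h \rangle \le o(\|h\|)$ holds, whenever $h$ satisfies $f(u + h) \le \bar{\alpha}$. The latter
in turn implies $(u^*, 0) \in \hat{N}_{Q_{\bar{\alpha}}}(u, \bar{\alpha})$. Taking limits of Fréchet
normals and applying (2.1) completes the proof. □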

Remark 2.6. Proposition 2.5 and its proof easily extend to the case when $f$ is
defined on certain infinite dimensional spaces (e.g. Hilbert spaces).

We now record the very useful generalization of the classical Mean Value Theorem
to an entirely nonsmooth setting. See for example [9, Theorem 2.4].

Theorem 2.7 (Lebourg's Mean Value Theorem). Consider a function $f\colon \mathbb{R}^n \to \mathbb{R}$
that is Lipschitz continuous on an open set containing the line segment $(x, y)$. Then
there exists a point $u$ in $(x, y)$ satisfying

$$f(y) - f(x) \in \langle \partial_c f(u), y - x \rangle.$$



For a set $Q \subset \mathbb{R}^n$ and a point $x \in \mathbb{R}^n$, the distance of $x$ from $Q$ is

$$d(x, Q) := \inf_{y \in Q} \|x - y\|,$$

and the metric projection of $x$ onto $Q$ is

$$P_Q(x) := \{y \in Q : \|x - y\| = d(x, Q)\}.$$

A fundamental notion in variational analysis is that of slope. For more details about
slope and its relevance to the theory of metric regularity, see [17].

Definition 2.8 (Slope). Consider a function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$, and a point $\bar{x} \in \mathbb{R}^n$
with $f(\bar{x})$ finite. The slope of $f$ at $\bar{x}$ is

$$|\nabla f|(\bar{x}) := \limsup_{x \to \bar{x}} \frac{(f(\bar{x}) - f(x))^+}{\|\bar{x} - x\|}.$$

The limiting slope is

$$\overline{|\nabla f|}(\bar{x}) := \liminf_{x \xrightarrow{f} \bar{x}} |\nabla f|(x),$$

where the convergence $x \xrightarrow{f} \bar{x}$ means $(x, f(x)) \to (\bar{x}, f(\bar{x}))$.

For $C^1$-smooth functions $f$ on $\mathbb{R}^n$, the equation $|\nabla f|(\bar{x}) = \overline{|\nabla f|}(\bar{x}) = \|\nabla f(\bar{x})\|$
holds. The following result, which follows from the proofs of [17, Propositions 1 and 2,
Chapter 3], establishes an elegant relationship between the slope and subdifferentials.
We outline a proof for completeness.

Proposition 2.9 (Slope and subdifferentials). Consider a lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$,
and a point $\bar{x} \in \mathbb{R}^n$ with $f(\bar{x})$ finite. Then we have $|\nabla f|(\bar{x}) \le d(0, \hat{\partial} f(\bar{x}))$, and
furthermore the equality

$$\overline{|\nabla f|}(\bar{x}) = d(0, \partial f(\bar{x})) \quad \text{holds.}$$

In particular, the two conditions $\overline{|\nabla f|}(\bar{x}) = 0$ and $0 \in \partial f(\bar{x})$ are equivalent.

Proof. The inequality $|\nabla f|(\bar{x}) \le d(0, \hat{\partial} f(\bar{x}))$ is immediate from the definition of
the Fréchet subdifferential. Now define $m = \overline{|\nabla f|}(\bar{x})$. One may easily check that if
$m$ is infinite, then the subdifferential $\partial f(\bar{x})$ is empty, and therefore the result holds
trivially. Consequently we may suppose that $m$ is finite.

Fix an arbitrary $\epsilon > 0$, and let $x$ be a point satisfying

$$\|x - \bar{x}\| < \epsilon, \quad |f(x) - f(\bar{x})| < \epsilon, \quad \text{and} \quad |\nabla f|(x) < m + \epsilon.$$

Define the function $g(u) := f(u) + (m + \epsilon)\|u - x\|$. Observe that for all $u$ sufficiently
close to $x$, we have $g(u) \ge f(x)$. We deduce (see e.g. [26, Exercise 10.10])

$$0 \in \partial g(x) \subset \partial f(x) + (m + \epsilon) B.$$

Hence we obtain the inequality $m + \epsilon \ge d(0, \partial f(x))$. Letting $\epsilon$ tend to zero, we deduce
$m \ge d(0, \partial f(\bar{x}))$.

To see the reverse inequality, consider a vector $\bar{v} \in \partial f(\bar{x})$ achieving $d(0, \partial f(\bar{x}))$.
Then there exist sequences of points $x_i$ and vectors $v_i \in \hat{\partial} f(x_i)$ with $(x_i, f(x_i), v_i) \to
(\bar{x}, f(\bar{x}), \bar{v})$. Observe that for each index $i$, we have $\|v_i\| \ge |\nabla f|(x_i)$. Letting $i$ tend
to infinity, the result follows. □

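Thus by Proposition 2.9 for any lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ and any point $\bar{x} \in \mathbb{R}^n$,
we have

$$d(0, \partial f(\bar{x})) = \overline{|\nabla f|}(\bar{x}) \le |\nabla f|(\bar{x}) \le d(0, \hat{\partial} f(\bar{x})).$$

In particular, if $f$ is subdifferentially regular at $\bar{x}$, then the slope and the limiting
slope are one and the same, that is the equation $|\nabla f|(\bar{x}) = \overline{|\nabla f|}(\bar{x})$ holds.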
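The gap between the slope and the limiting slope can be seen on the function $f(x) = \min\{x, 0\}$ mentioned in the introduction. The sketch below is a crude sampled estimate, not a rigorous computation: the sampling radius `h`, the offsets `deltas`, and the helper names are assumptions introduced here for illustration. At the origin the slope is 1 (descent is available to the left), while points just to the right of the origin have slope 0, so the limiting slope at the origin is 0.

```python
def f(x):
    # The example from the introduction: f(x) = min{x, 0}.
    return min(x, 0.0)

def slope(func, x, h=1e-6, samples=100):
    """Sampled estimate of limsup_{y -> x} (func(x) - func(y))^+ / |x - y|,
    using points y at distance at most h from x."""
    best = 0.0
    for k in range(1, samples + 1):
        step = h * k / samples
        for y in (x - step, x + step):
            quotient = max(func(x) - func(y), 0.0) / abs(x - y)
            best = max(best, quotient)
    return best

def limiting_slope(func, x, deltas=(1e-2, 1e-3, 1e-4)):
    """Sampled estimate of the liminf of the slope over points near x.
    The test function is continuous, so nearby points automatically have
    nearby function values."""
    candidates = [slope(func, x)]
    for d in deltas:
        candidates.append(slope(func, x - d))
        candidates.append(slope(func, x + d))
    return min(candidates)

slope_at_zero = slope(f, 0.0)
limiting_slope_at_zero = limiting_slope(f, 0.0)
```

In this example the limiting subdifferential at the origin contains 0, so the estimate of the limiting slope agrees with the value $d(0, \partial f(0)) = 0$ predicted by Proposition 2.9.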

In our work, it will be useful to extend functions that are Lipschitz continuous
on their domains to ones that are Lipschitz continuous on the whole space. This
can be done in a standard way. However, we will need more precise properties of such
an extension, which we record below.

Lemma 2.10 (Inclusion of subdifferentials). Consider a lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$
that is Lipschitz continuous on its domain with modulus $l$. Choose a real number
$K > l$ and define the function

$$g(x) := \inf_{y \in \mathbb{R}^n} \{f(y) + K\|x - y\|\}. \tag{2.2}$$

Then $g$ is a $K$-Lipschitz continuous function, coinciding with $f$ on $\operatorname{dom} f$, and having
the property that for any point $x \in \operatorname{dom} f$ the inclusion $\partial g(x) \subset \partial f(x)$ holds.
Furthermore, we have $|\nabla g|(x) = |\nabla f|(x)$ and $\overline{|\nabla g|}(x) = \overline{|\nabla f|}(x)$ for any point $x \in \operatorname{dom} f$.

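Proof. It is well-known that $g$ is a $K$-Lipschitz continuous function on $\mathbb{R}^n$ and
that it coincides with $f$ on $\operatorname{dom} f$. See for example [TB] or [§1 Problem 11.6]. Define
the function $\phi\colon \mathbb{R}^n \times \mathbb{R}^n \to \overline{\mathbb{R}}$ by $\phi(x, y) = f(y) + K\|x - y\|$. Then $g$ is simply the
marginal function

$$g(x) = \min_{y} \phi(x, y).$$

Fix a point $\bar{x} \in \operatorname{dom} f$. It is not difficult to check that the solution mapping $x \mapsto \operatorname{argmin} \phi(x, \cdot)$ is locally
bounded around $\bar{x}$, and is outer-semicontinuous (in fact, inner-semicontinuous as well).
This verifies the assumptions of [20, Proposition 3.3]. Applying this proposition, we
deduce that for any subgradient $v \in \partial g(\bar{x})$, we have

$$(v, 0) \in \bigcup_{y \in \operatorname{argmin} \phi(\bar{x}, \cdot)} \partial \phi(\bar{x}, y) \subset \{0\} \times \partial f(\bar{x}) + K\{(w, -w) : w \in B\},$$

where the latter inclusion follows from the observation $\{\bar{x}\} = \operatorname{argmin}_{y \in \mathbb{R}^n} \phi(\bar{x}, y)$.
We deduce existence of a vector $w \in B$ with $v = Kw \in \partial f(\bar{x})$. Hence we have
established the inclusion $\partial g(\bar{x}) \subset \partial f(\bar{x})$. By Proposition 2.9, this implies $\overline{|\nabla g|}(\bar{x}) \ge \overline{|\nabla f|}(\bar{x})$.
On the other hand, we claim that the equation $|\nabla f|(\bar{x}) = |\nabla g|(\bar{x})$ holds for any $\bar{x}$ in
$\operatorname{dom} f$. Indeed, the inequality $|\nabla g|(\bar{x}) \ge |\nabla f|(\bar{x})$ is clear. On the other hand, consider
a point $x \notin \operatorname{dom} f$ and a point $y \in \operatorname{dom} f$ with $g(x) = f(y) + K\|x - y\|$. Then we
have

$$\frac{(g(\bar{x}) - g(y))^+}{\|\bar{x} - y\|} \ge \frac{(g(\bar{x}) - g(x) + K\|y - x\|)^+}{\|\bar{x} - x\| + \|y - x\|} \ge \frac{(g(\bar{x}) - g(x))^+}{\|\bar{x} - x\|},$$

where the last inequality follows since $g$ is $K$-Lipschitz continuous. The equality
$|\nabla g|(\bar{x}) = |\nabla f|(\bar{x})$ follows. Finally, we deduce

$$\overline{|\nabla g|}(\bar{x}) \le \liminf_{x \to \bar{x}} \{|\nabla g|(x) : x \in \operatorname{dom} f\} = \overline{|\nabla f|}(\bar{x}) \le \overline{|\nabla g|}(\bar{x}),$$

and hence we have equality throughout, thereby completing the proof. □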
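The extension (2.2) is easy to compute in one dimension. The sketch below is only an illustration under stated assumptions: the function $f(y) = 2y$ on the domain $[0,1]$, the constant $K = 3$, and the grid used to approximate the infimum are all hypothetical choices made here. It shows that $g$ agrees with $f$ on the domain and grows at rate $K$ away from it.

```python
import math

def f(y):
    # Hypothetical example: f(y) = 2*y on its domain [0, 1], +infinity elsewhere.
    return 2.0 * y if 0.0 <= y <= 1.0 else math.inf

K = 3.0  # any constant strictly larger than the Lipschitz modulus 2 of f on its domain
GRID = [j / 1000.0 for j in range(1001)]  # discretization of dom f = [0, 1]

def g(x):
    """Grid approximation of g(x) = inf_y { f(y) + K*|x - y| } from (2.2)."""
    return min(f(y) + K * abs(x - y) for y in GRID)

# g coincides with f on the domain and extends it outside with growth rate K.
values = {x: g(x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)}
```

Because the infimum is taken over functions of $x$ that are each $K$-Lipschitz, the grid approximation is itself $K$-Lipschitz, mirroring the first claim of Lemma 2.10.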
We record below the celebrated Ekeland's variational principle. 
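Theorem 2.11 (Ekeland's variational principle). Consider a lsc function $g\colon \mathbb{R}^n \to \overline{\mathbb{R}}$
that is bounded from below. Suppose that for some $\epsilon > 0$ and $x \in \mathbb{R}^n$, we have
$g(x) \le \inf g + \epsilon$. Then for any $\rho > 0$, there exists a point $\bar{u}$ satisfying $g(\bar{u}) \le g(x)$,
$\|\bar{u} - x\| \le \rho^{-1} \epsilon$, and

$$g(u) + \rho \|u - \bar{u}\| > g(\bar{u}), \quad \text{for all } u \in \mathbb{R}^n \setminus \{\bar{u}\}.$$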
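On a finite grid the conclusions of Ekeland's principle can be realized directly, since a minimizer of $u \mapsto g(u) + \rho\|u - x\|$ always exists there. The sketch below is a toy illustration: the test function, the grid, the starting point, and the value of $\rho$ are assumptions made here, and the check uses the non-strict form of the last inequality.

```python
import math

def g(u):
    # Hypothetical test function, bounded from below on the grid.
    return u * u + 0.3 * math.sin(20.0 * u)

GRID = [-2.0 + 4.0 * j / 4000 for j in range(4001)]
x = GRID[3000]   # starting point, chosen on the grid
rho = 0.5        # the parameter rho > 0 from the theorem

# A minimizer of u -> g(u) + rho*|u - x| over the grid plays the role of u_bar.
u_bar = min(GRID, key=lambda u: g(u) + rho * abs(u - x))

# With this eps we have g(x) <= inf g + eps on the grid.
eps = g(x) - min(g(u) for u in GRID)
```

The three conclusions then follow from the minimality of `u_bar` and the triangle inequality, which is the same mechanism that drives the proof in the continuous setting.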
The following consequence of Ekeland's variational principle will play a crucial
role in our work [17, Basic Lemma, Chapter 1]. We provide a proof for completeness.

Lemma 2.12 (Error bound). Consider a lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$. Assume that
for some point $x \in \operatorname{dom} f$, there are constants $\alpha < f(x)$ and $r, K > 0$ so that the
implication

$$\alpha < f(u) \le f(x) \ \text{ and } \ \|u - x\| \le K \implies |\nabla f|(u) \ge r \quad \text{holds.}$$

If in addition the inequality $f(x) - \alpha < Kr$ is valid, then the lower-level set $[f \le \alpha]$
is nonempty and we have the estimate $d(x, [f \le \alpha]) \le r^{-1}(f(x) - \alpha)$.

Proof. Define a lsc function $g\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ by setting $g(u) := (f(u) - \alpha)^+$, and choose
a real number $\rho < r$ satisfying $f(x) - \alpha < K\rho$. By Ekeland's principle (Theorem 2.11),
there exists a point $\bar{u}$ satisfying

$$g(\bar{u}) \le g(x), \quad \|\bar{u} - x\| \le \rho^{-1} g(x) \le K$$

and

$$g(u) + \rho \|u - \bar{u}\| \ge g(\bar{u}), \quad \text{for all } u.$$

Consequently we obtain the inequality $|\nabla g|(\bar{u}) \le \rho$. On the other hand, a simple
computation shows that this can happen only provided $g(\bar{u}) = 0$, for otherwise we
would have $|\nabla g|(\bar{u}) = |\nabla f|(\bar{u}) \ge r$. Hence $\bar{u}$ lies in the level set $[f \le \alpha]$, and we
obtain the estimate $d(x, [f \le \alpha]) \le \rho^{-1}(f(x) - \alpha)$. The result now follows by taking
$\rho$ arbitrarily close to (and still smaller than) $r$. □
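The error bound of Lemma 2.12 can be sanity-checked on a one-dimensional example. In the sketch below the test function $f(u) = |u|(1 + |u|)$, the grid, and the sample points are hypothetical choices made for illustration; the slope of this function is $1 + 2|u| \ge 1$ at every $u \ne 0$, so one may take $r = 1$ and $K$ arbitrarily large.

```python
def f(u):
    # Hypothetical test function with slope 1 + 2|u| >= 1 at every u != 0.
    return abs(u) * (1.0 + abs(u))

SPACING = 1e-3
GRID = [(j - 3000) / 1000.0 for j in range(6001)]  # grid on [-3, 3] containing 0 exactly

def dist_to_sublevel(x, alpha):
    """Grid estimate of d(x, [f <= alpha]); it overestimates by at most SPACING."""
    return min(abs(x - u) for u in GRID if f(u) <= alpha)

r = 1.0  # lower bound on the slope wherever alpha < f(u), valid for alpha >= 0
checks = []
for x in (0.5, 1.0, -2.0):
    for alpha in (0.1, 0.5, 1.0):
        if alpha < f(x):
            # Pair (estimated distance, bound r^{-1} * (f(x) - alpha)) from the lemma.
            checks.append((dist_to_sublevel(x, alpha), (f(x) - alpha) / r))
```

Each pair in `checks` compares the estimated distance to the lower-level set with the bound $r^{-1}(f(x) - \alpha)$ from the lemma.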

We conclude this subsection with the following standard result of Linear Algebra. 

Lemma 2.13 (Result in Linear Algebra). Suppose that $\mathbb{R}^n$ can be written as a
direct sum $\mathbb{R}^n = V \oplus W$, for some vector subspaces $V$ and $W$. Then for any vector
$b \in \mathbb{R}^n$, the equations

$$P_V(b) = (b + W) \cap V = \operatorname*{argmin}_{z \in b + W} \|z\| \quad \text{hold.}$$

Proof. Observe $b = P_V(b) + P_W(b)$, and consequently the inclusion
$P_V(b) \in (b + W) \cap V$ holds.
The reverse inclusion follows from the trivial computation

$$z \in (b + W) \cap V \implies z - P_V(b) \in V \cap W \implies z = P_V(b).$$

Now observe that first order optimality conditions imply that the unique
minimizer $z$ of the problem

$$\min_{z \in b + W} \|z\|^2$$

is characterized by the inclusion $z \in (b + W) \cap V$, and hence the result follows. □
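A small numerical instance of Lemma 2.13 may help fix ideas. The sketch assumes, as the decomposition $b = P_V(b) + P_W(b)$ in the proof suggests, that $V$ and $W$ are orthogonal complements; the particular subspaces of $\mathbb{R}^3$ and the vector $b$ below are hypothetical data chosen for illustration.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical data: V = span{v1, v2} and W = span{w} are orthogonal complements in R^3.
v1, v2 = (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)
w = (1.0, -1.0, 0.0)
b = (3.0, 1.0, 2.0)

def project_onto_span(vec, basis):
    """Orthogonal projection onto the span of mutually orthogonal basis vectors."""
    out = [0.0] * len(vec)
    for e in basis:
        c = dot(vec, e) / dot(e, e)
        out = [o + c * ei for o, ei in zip(out, e)]
    return tuple(out)

p_v = project_onto_span(b, [v1, v2])

# Minimize ||b + t*w|| over real t: the optimal t is -<b, w> / <w, w>.
t_star = -dot(b, w) / dot(w, w)
z_star = tuple(bi + t_star * wi for bi, wi in zip(b, w))
```

Here the projection `p_v` and the minimal-norm element `z_star` of $b + W$ coincide, and `z_star` is orthogonal to `w`, so it lies in $V$.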
2.2. Slope and gradient descent. What is steepest descent in absence of
differentiability? As was discussed in the introduction, there are two influential
approaches. One approach is immediate: replace the gradient in the differential equation
$\dot{x}(t) = -\nabla f(x(t))$ by a subdifferential, thus obtaining a differential inclusion. In other
words, we may consider absolutely continuous curves $x\colon [0, T) \to \mathbb{R}^n$ satisfying the
evolution equation

$$\dot{x}(t) \in -\partial f(x(t)), \quad \text{for almost every } t \in [0,T).$$

Of course, one may replace the limiting subdifferential in the inclusion above by a 
subdifferential of another kind, though we will not dwell on this issue. For more on 
the theory of differential inclusions, and this approach in particular, see the classical 
references [21 [5]. 

The second approach can be motivated by an elementary calculation showing that
$C^1$-smooth solutions of the dynamical system $\dot{x}(t) = -\nabla f(x(t))$ can be characterized
by the pair of equations

$$\|\dot{x}(t)\| = \|\nabla f(x(t))\| \quad \text{and} \quad \frac{d(f \circ x)}{dt}(t) = -\big(\|\nabla f(x(t))\|\big)^2, \quad \text{for each } t \in [0,T).$$

In light of this observation, the ensuing notion arises naturally. The definition we
propose follows closely that of curves of maximal slope (discussed below), introduced
and extensively studied in [15].

Definition 2.14 (Curve of near-maximal slope). A curve $x(t)$ defined on a
segment $[0, T)$ (finite or infinite) is a curve of near-maximal slope if we have

(a) $x(t)$ is absolutely continuous;

(b) $\|\dot{x}(t)\| = \overline{|\nabla f|}(x(t))$ a.e. on $[0,T)$;

(c) $(f \circ x)(t) = f(x(t))$ is absolutely continuous, non-increasing, and such that

$$\frac{d(f \circ x)}{dt}(t) = -\big(\overline{|\nabla f|}(x(t))\big)^2 \quad \text{a.e. on } [0,T).$$

If the limiting slope $\overline{|\nabla f|}(x(t))$ in the definition above is replaced by the slope
$|\nabla f|(x(t))$, then the resulting curve is said to be a curve of maximal slope. Clearly
whenever $f$ is subdifferentially regular, the two notions coincide. Figure 2.1 illustrates
a curve that is a steepest descent curve, in both senses, for a function $f$ on $\mathbb{R}^2$.

3. Main results. In this section, we analyze when curves of near-maximal slope 
and solutions of evolution equations exist. In fact, we will construct a curve that 
(under reasonable conditions) is a steepest descent curve in both senses. To this end, 
we propose to study the following intuitive notion. 

Definition 3.1 (Proximal descent curves). Consider a function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$,
a point $\bar{x}$, and a real number $\eta > 0$. Let $0 = \tau_0 < \tau_1 < \ldots < \tau_k = \eta$ be a
partition of $[0, \eta]$ into $k$ equal parts. With this partition, we associate a piecewise
linear curve $u_k(\tau)$, for $\tau \in [0, \eta]$, as follows. Set $u_k(0) = \bar{x}$, and inductively define
$u_k(\tau_{i+1})$ to be any point belonging to the projection of $u_k(\tau_i)$ onto the lower level set
$[f \le f(\bar{x}) - \tau_{i+1}]$, provided that this set is nonempty. Then we will call any limit
point $x(\tau)$ of $u_k(\tau)$ in the uniform metric, as $k$ tends to infinity, a proximal descent
curve of $f$ at $\bar{x}$.
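The construction in Definition 3.1 can be carried out explicitly when projections onto lower level sets are available in closed form. The sketch below is an illustration only: it assumes the smooth test function $f(x) = \tfrac{1}{2}\|x\|^2$ on $\mathbb{R}^2$, whose lower level sets are balls, and the helper names, starting point, and parameters are hypothetical. It builds the nodes $u_k(\tau_i)$ of the piecewise linear curve.

```python
import math

def f(p):
    # Hypothetical smooth test function f(x) = 0.5*||x||^2 on R^2.
    return 0.5 * (p[0] ** 2 + p[1] ** 2)

def project_onto_sublevel(p, level):
    """Projection onto [f <= level], which is the ball of radius sqrt(2*level)."""
    radius = math.sqrt(2.0 * level)
    norm = math.hypot(p[0], p[1])
    if norm <= radius:
        return p
    return (p[0] * radius / norm, p[1] * radius / norm)

def proximal_descent_nodes(x_bar, eta, k):
    """Nodes u_k(tau_i), i = 0..k, of the piecewise linear curve in Definition 3.1."""
    taus = [eta * i / k for i in range(k + 1)]
    nodes = [x_bar]
    for tau in taus[1:]:
        # Project the previous node onto the lower level set [f <= f(x_bar) - tau].
        nodes.append(project_onto_sublevel(nodes[-1], f(x_bar) - tau))
    return taus, nodes

taus, nodes = proximal_descent_nodes((3.0, 4.0), eta=4.5, k=1000)
```

In this smooth example the function values along the nodes decrease exactly like $f(\bar{x}) - \tau_i$, and the speed of the piecewise linear curve is close to $1/\|\nabla f\|$, which is the behaviour one expects from a curve parametrized by function values.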

So when do proximal descent curves exist and what are their properties? We will 
see that in answering this question, the following condition appears naturally. At the 
risk of sounding extravagant, we give this property a name. 
Fig. 2.1. $f(x, y) = \max\{x + y, |x - y|\} + x(x + 1) + y(y + 1) + 100$



Definition 3.2 (Level prox-stability). A lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ is level
prox-stable at $\bar{x}$ if there exists a neighborhood $U$ of $\bar{x}$ so that for all sufficiently small $\eta > 0$
and for any $x \in U$ satisfying



x G [/(x) - i] < f < 00] and y G P i7n [/< / ( £ )_^(a;) /(y) = /(x) - 77, 
holds. 

Roughly speaking, a function $f$ is level prox-stable provided that it has locally
closed level sets and, in a local sense, projecting points in the domain of $f$
onto lower level sets $[f \le f(x) - \eta]$ is the same as projecting onto the level sets
$[f = f(x) - \eta]$. In particular, one can easily check that functions continuous on lines
(those functions $f$ so that for any pair $x, y \in \operatorname{dom} f$, the restriction of $f$ to the line
segment joining $x$ and $y$ is finite and continuous) are necessarily level prox-stable.
Hence all functions of the form $f = g + \phi$, where $\phi\colon \mathbb{R}^n \to \mathbb{R}$ is a continuous function
and $g\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ is a lsc, convex function, are level prox-stable [21, Theorem 10.2].

We will also need the following assumption, which is almost the same as requiring
$\overline{|\nabla f|}$ to be an "upper-gradient" in the sense of [2].

ASSUMPTION 3.3 (Upper estimate). For a function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ and for any
curve $x\colon (0, \eta) \to \operatorname{dom} f$ the inequality

$$\left| \frac{d(f \circ x)}{d\tau}(\tau) \right| \le \|\dot{x}(\tau)\| \, \overline{|\nabla f|}(x(\tau)) \quad \text{holds for a.e. } \tau \in [0, \eta),$$

provided that both $x$ and $f \circ x$ are absolutely continuous.

In particular, this assumption is valid for any subdifferentially regular function,
as we will see later (Lemma 5.1). We now arrive at the main result of this section.



Trajectories of subgradient dynamical systems 






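Theorem 3.4 (Existence and properties of proximal descent curves).
Consider a lsc function $f\colon \mathbb{R}^n \to \overline{\mathbb{R}}$ and a point $\bar{x} \in \mathbb{R}^n$, with $f$ finite at $\bar{x}$ and
$0 \notin \partial f(\bar{x})$. Then for all sufficiently small $\eta > 0$, the following are true.
Existence: There exists a proximal descent curve $x\colon [0, \eta] \to \mathbb{R}^n$ of $f$ at $\bar{x}$.
Regularity: Any proximal descent curve $x\colon [0, \eta] \to \mathbb{R}^n$ of $f$ at $\bar{x}$ is necessarily
Lipschitz continuous and lies fully in the domain of $f$.
Speed of descent: Any proximal descent curve $x\colon [0, \eta] \to \mathbb{R}^n$ of $f$ at $\bar{x}$ satisfies
the following properties.

(a) $f(x(\tau)) \le f(\bar{x}) - \tau$, for each $\tau \in [0, \eta]$.

(b) $\dfrac{1}{\|\dot{x}(\tau)\|} \ge \overline{|\nabla f|}(x(\tau))$ for a.e. $\tau \in [0, \eta]$.

If $f$ is level prox-stable at $\bar{x}$ and is continuous on $\operatorname{dom} f$, then the above
estimates can be strengthened:

(a) $f(x(\tau)) = f(\bar{x}) - \tau$, for each $\tau \in [0, \eta]$. In particular $f \circ x$ is Lipschitz.

(b) $\operatorname{lip} f(x(\tau); \operatorname{dom} f) \ge \dfrac{1}{\|\dot{x}(\tau)\|} \ge \overline{|\nabla f|}(x(\tau))$ for a.e. $\tau \in [0, \eta]$.

If in addition, Assumption 3.3 holds, then we have

$$\frac{1}{\|\dot{x}(\tau)\|} = \overline{|\nabla f|}(x(\tau)) \quad \text{for a.e. } \tau \in [0, \eta].$$

Differential inclusion: Assume that $f$ is level prox-stable at $\bar{x}$ and is continuous on
$\operatorname{dom} f$. Then for any proximal descent curve $x\colon [0, \eta] \to \mathbb{R}^n$ of $f$ at $\bar{x}$, the
inclusion

$$\dot{x}(\tau) \in -\operatorname{cl}\operatorname{conv}\big[ (\operatorname{cone} \partial f(x(\tau))) \cup \partial^\infty f(x(\tau)) \big] \quad \text{holds for a.e. } \tau \in [0, \eta].$$

If $f$ is subdifferentiable along the curve $x(\tau)$, then the inclusion

$$\dot{x}(\tau) \in -\operatorname{cl}\operatorname{cone} \partial_c f(x(\tau)) \quad \text{holds for a.e. } \tau \in [0, \eta].$$

If $f$ is Lipschitz continuous on its domain, then the closure is not needed in
the inclusion above.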

Proof. Since zero is not a subgradient of $f$ at $\bar{x}$, we can find constants $\eta > 0$, $r > 0$
and $C > 0$ such that all conditions of Lemma 2.12 are satisfied for $\bar{x}$, $\alpha = f(\bar{x}) - \eta$
and $K = C$. In particular, shrinking $\eta$ we may enforce the inequality $\eta < rC$. Let
$0 = \tau_0 < \tau_1 < \ldots < \tau_k = \eta$ be a partition of $[0, \eta]$ into $k$ equal parts, and let
$\lambda := \frac{\tau_{i+1} - \tau_i}{\eta}$ be the partition size. Finally define the piecewise linear functions $u_k(\tau)$
as in the definition of proximal descent curves. For notational convenience, in the
proof we will suppress the index $k$ in $u_k(\tau)$.

We now show that all the points $u(\tau_i)$ (for $i = 0, \ldots, k$) are well-defined, and in
terms of the quantities

$$r_i := \inf\{|\nabla f|(u) : f(\bar{x}) - \tau_{i+1} < f(u) \le f(u(\tau_i)), \ \|u - u(\tau_i)\| \le \lambda C\},$$

we have

$$\|u(\tau_{i+1}) - u(\tau_i)\| \le r_i^{-1}(\tau_{i+1} - \tau_i), \tag{3.1}$$

and

$$r_{i+1} \ge r, \qquad \|u(\tau_{i+1}) - \bar{x}\| \le r^{-1} \tau_{i+1}. \tag{3.2}$$



To this end, suppose that equations (3.1) and (3.2) are valid for indices $i = 0, \ldots, j - 1$.
Observe if we have $f(u(\tau_j)) \le f(\bar{x}) - \tau_{j+1}$, then equality $u(\tau_j) = u(\tau_{j+1})$ holds and
the inductive step is true trivially. Hence suppose otherwise. We claim that the
conditions of Lemma 2.12 are satisfied with $x = u(\tau_j)$, $\alpha = f(\bar{x}) - \tau_{j+1}$, $K = \lambda C$, and
with $r_j$ in place of $r$.

To this end, we deduce

$$f(\bar{x}) - \tau_{j+1} < f(u) \le f(u(\tau_j)) \ \text{ and } \ d(u, u(\tau_j)) \le \lambda C \implies |\nabla f|(u) \ge r_j.$$

Observe

$$f(u(\tau_j)) - (f(\bar{x}) - \tau_{j+1}) \le \tau_{j+1} - \tau_j < (\tau_{j+1} - \tau_j)\frac{rC}{\eta} = \lambda r C \le \lambda r_j C,$$

which is the first of the desired relations. The second relation follows immediately
from the definition of $r_j$.

Applying Lemma 2.12 we conclude that the lower-level set $[f \le f(\bar{x}) - \tau_{j+1}]$ is
nonempty, and that the inequality $\|u(\tau_{j+1}) - u(\tau_j)\| \le r_j^{-1}(\tau_{j+1} - \tau_j)$ holds.
Consequently, we obtain

$$\|u(\tau_{j+1}) - \bar{x}\| \le \|u(\tau_{j+1}) - u(\tau_j)\| + \|u(\tau_j) - \bar{x}\| \le r_j^{-1}(\tau_{j+1} - \tau_j) + r^{-1}\tau_j \le r^{-1}\tau_{j+1}.$$

Finally we claim that the inequality $r_{j+1} \ge r$ holds. To see this, consider a point $u$
satisfying $f(\bar{x}) - \tau_{j+2} < f(u) \le f(u(\tau_{j+1}))$ and $\|u - u(\tau_{j+1})\| \le \lambda C$. Taking (3.2)
into account, along with the inequality $r^{-1} < C/\eta$, we obtain

$$\|u - \bar{x}\| \le \|u - u(\tau_{j+1})\| + \|u(\tau_{j+1}) - \bar{x}\| \le \frac{\tau_{j+2} - \tau_{j+1}}{\eta} C + \frac{\tau_{j+1}}{\eta} C = \frac{\tau_{j+2}}{\eta} C \le C.$$

Combining this with the obvious inequality $f(\bar{x}) \ge f(u) \ge f(\bar{x}) - \eta$, we deduce
$|\nabla f|(u) \ge r$, and consequently $r_{j+1} \ge r$. This completes the induction.

It follows immediately from (3.1) that for varying $k$, the mappings $u_k(\tau)$ are
Lipschitz continuous with a uniform constant $r^{-1}$. As all of these mappings coincide at
$\tau = 0$, the well-known theorem of Arzelà and Ascoli guarantees that a certain
subsequence of $u_k(\tau)$ converges uniformly on $[0, \eta]$ to some mapping $x(\tau)$. Furthermore
any uniform limit of $u_k(\tau)$ is clearly also Lipschitz continuous with the same constant
$r^{-1}$. This establishes the existence part of the theorem.

We now claim that the inequality

$$f(x(\tau)) \le f(\bar{x}) - \tau \quad \text{holds for each } \tau \in [0, \eta]. \tag{3.3}$$

Indeed, fix $\tau \in [0, \eta]$ and observe that there exists a sequence $\tau_k \to \tau$ with $f(u_k(\tau_k)) \le
f(\bar{x}) - \tau_k$. Since $f$ is lsc, we deduce

$$f(x(\tau)) \le \liminf_{k \to \infty} f(u_k(\tau_k)) \le f(\bar{x}) - \tau,$$

thereby establishing (3.3). In particular, we conclude that the curve $x(\tau)$ lies fully in
the domain of $f$, thereby completing the proof of the regularity claim.
Next we claim that almost everywhere on [0, 77] , we have 

||i(r)|| < . (3.4) 

" |V/|(z(t)) 



This is almost immediate from (3.1 ). Indeed fix r € (0, rf) at which x is differentiable. 

br large k 



We can refer to (3.1 1 to show that for large k, we have 

1 

Jfe? 






for some $i \in \{0, \ldots, k\}$. Observe that since $\dot u_k(\tau)$ are uniformly bounded, up to a
subsequence, the curves $\dot u_k(\tau)$ converge weakly to $\dot x(\tau)$ in $L^2[0,\eta]$. Since weak
convergence does not increase pointwise the norm almost everywhere, letting $k$ tend to
infinity, we deduce the result.

Now suppose that $f$ is level prox-stable at $\bar x$. We will show that we have equality
in (3.3). To see this, fix $\tau \in [0,\eta]$ and observe that for any curve $u_k$ obtained from
the partition $0 = \tau_0 < \tau_1 < \ldots < \tau_k = \eta$ of the interval $[0,\eta]$, the equality
$f(u_k(\tau_i)) = f(\bar x) - \tau_i$ holds.

Moreover, for any $\delta > 0$ there exists a sufficiently large integer $k$ and a point
$\tau' \in [0,\eta]$, satisfying $f(u_k(\tau')) = f(\bar x) - \tau'$ and

\[ \max\{|\tau - \tau'|,\ |f(x(\tau)) - f(x(\tau'))|,\ |f(x(\tau')) - f(u_k(\tau'))|\} < \delta. \]

We deduce

\[ |f(x(\tau)) - (f(\bar x) - \tau)| \le |f(x(\tau)) - f(x(\tau'))| + |f(x(\tau')) - f(u_k(\tau'))| + |f(u_k(\tau')) - (f(\bar x) - \tau)| < 3\delta. \]

Since $\delta$ can be arbitrarily small, we conclude that the equation

\[ f(x(\tau)) = f(\bar x) - \tau \quad \text{holds for each } \tau \in [0,\eta]. \tag{3.5} \]

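Now consider a real $\tau \in (0,\eta)$ at which $x$ is differentiable and a real $\delta > 0$. From the
equation above, we deduce that for sufficiently small $\epsilon$, we have

\begin{align*}
\epsilon = |f(x(\tau + \epsilon)) - f(x(\tau))| &= |f(x(\tau) + \epsilon \dot x(\tau) + o(\epsilon)) - f(x(\tau))| \\
&\le \big(\operatorname{lip} f(x(\tau); \operatorname{dom} f) + \delta\big)\, \|\epsilon \dot x(\tau) + o(\epsilon)\|.
\end{align*}

Letting $\epsilon$ and $\delta$ tend to 0, we conclude that the inequality

\[ \frac{1}{\operatorname{lip} f(x(\tau); \operatorname{dom} f)} \le \|\dot x(\tau)\| \quad \text{holds for a.e. } \tau \in [0,\eta]. \tag{3.6} \]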



Suppose now that Assumption 3.3 holds. Observe that if we have $\partial f(x(\tau)) = \emptyset$, then
the equality $\frac{1}{\|\dot x(\tau)\|} = |\nabla f|(x(\tau))$ holds trivially. On the other hand, for a.e. $\tau \in [0,\eta]$
for which the subdifferential $\partial f(x(\tau))$ is nonempty, we have

\[ 1 = \left| \frac{d(f \circ x)}{d\tau}(\tau) \right| \le \|\dot x(\tau)\|\, |\nabla f|(x(\tau)) \le 1. \]

Hence we have equality throughout, completing the proof of the speed of descent
claim.

Suppose that $f$ is level prox-stable and continuous on $\operatorname{dom} f$. Observe that in
light of Proposition 2.5 for any index $k$ and any $\tau \in [\tau_i, \tau_{i+1}]$ (for $i = 1, \ldots, k$)
we have $\dot u_k(\tau) \in -(\operatorname{cone} \partial f(u_k(\tau_{i+1}))) \cup \partial^\infty f(u_k(\tau_{i+1}))$. Furthermore, recall that
restricting to a subsequence we may suppose that $\dot u_k$ converges weakly to $\dot x(\tau)$ in
$L^2[0,\eta]$. Mazur's Lemma then implies that a subsequence of convex combinations of
the form $\sum_{n=k}^{N(k)} \alpha^k_n \dot u_n$ converges strongly to $\dot x$ as $k$ tends to $\infty$. Since convergence in
$L^2[0,\eta]$ implies almost everywhere pointwise convergence along a further subsequence,
we deduce that for almost every $\tau \in [0,\eta]$, we have

\[ \Big\| \sum_{n=k}^{N(k)} \alpha^k_n \dot u_n(\tau) - \dot x(\tau) \Big\| \to 0. \]

Therefore if the inclusion

\[ \dot x(\tau) \in -\operatorname{cl\,conv}\,(\operatorname{cone} \partial f(x(\tau))) \cup \partial^\infty f(x(\tau)) \]

did not hold, then we would deduce that there exists a subsequence of vectors $\dot u_{n_l}(\tau)$
with $\lim_{l \to \infty} \dot u_{n_l}(\tau)$ not lying in the set on the right-hand side of the inclusion above.
This immediately yields a contradiction.

If $f$ is subdifferentiable along the curve $x(\tau)$, then the claimed inclusion follows
from the definition of the Clarke subdifferential. If $f$ is Lipschitz continuous on its
domain, then Lemma 2.5 implies that $f$ is subdifferentiable on $\operatorname{dom} f$. A standard
argument using Carathéodory's theorem then implies that the closure operation is not
needed in the inclusion above. This completes the proof of the differential inclusion
claim and the proof of the theorem. □



In Theorem 3.4 in particular we saw that if $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ is locally Lipschitz
continuous on $\operatorname{dom} f$ at $\bar x$, then for any proximal descent curve $x\colon [0,\eta] \to \mathbf{R}^n$ of $f$ at
$\bar x$, with $\eta > 0$ sufficiently small, there exist multipliers $\lambda(\tau)$ satisfying

\[ \dot x(\tau) \in -\lambda(\tau)\, \partial_c f(x(\tau)) \quad \text{for a.e. } \tau \in [0,\eta]. \]

To further our goals, we need a more precise estimate on $\lambda(\tau)$. The following condition
comes to the fore.

Assumption 3.5. For a function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ and for any absolutely continuous
curve $x\colon [0,\eta) \to \operatorname{dom} f$, the function

\[ y \mapsto \langle \dot x(\tau), y \rangle \quad \text{is constant on } \partial_c f(x(\tau)) \]

for almost all $\tau \in [0,\eta)$.

Assumption 3.5 may look a bit strange at first sight. Nevertheless, in Section 5
we will see that the collection of functions satisfying this condition is very large, in
particular including all Lipschitz continuous subdifferentially regular functions and
all semi-algebraic functions.

Theorem 3.6 (Proximal descent curves and differential inclusions).
Consider a function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ that is locally Lipschitz continuous at a point $\bar x$
relative to its domain, and that satisfies $0 \notin \partial f(\bar x)$. Suppose in addition that $f$ is level
prox-stable at $\bar x$ and that Assumption 3.5 holds. Then for any proximal descent curve
$x\colon [0,\eta] \to \mathbf{R}^n$ of $f$ at $\bar x$, with $\eta > 0$ sufficiently small, we have

\[ \frac{\dot x(\tau)}{\|\dot x(\tau)\|^2} \in -\partial f(x(\tau)), \quad x(0) = \bar x, \quad \text{and} \]

\[ \|\dot x(\tau)\|^{-1} = d(0, \partial f(x(\tau))) = d(0, \partial_c f(x(\tau))), \]

for almost every $\tau \in [0,\eta]$.

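Proof. Without loss of generality, we may assume that $f$ is Lipschitz continuous
on the whole domain. Let $l$ be its Lipschitz modulus. Let $\tau \in [0,\eta]$ be such that $x$ is
differentiable at $\tau$ and the function

\[ y \mapsto \langle \dot x(\tau), y \rangle \quad \text{is constant on } \partial_c f(x(\tau)). \]

Consider the subspaces

\[ V_\tau = \operatorname{par} \partial_c f(x(\tau)), \qquad W_\tau = (\operatorname{par} \partial_c f(x(\tau)))^\perp, \]

and let $b$ be an arbitrary element of $\partial_c f(x(\tau))$. Then we have

\[ \operatorname{aff} \partial_c f(x(\tau)) = b + V_\tau. \]

Combining this with Theorem 3.4 we deduce that there exists a real $\lambda > 0$ satisfying

\[ -\dot x(\tau) \in \lambda\, \partial_c f(x(\tau)) \subset \lambda b + V_\tau. \]

We claim now that the inclusion $\dot x(\tau) \in W_\tau$ holds. To see this, observe that for any
reals $\lambda_i$ and for vectors $v_i, w_i \in \partial_c f(x(\tau))$, for $i = 1, \ldots, k$, we have

\[ \Big\langle \dot x(\tau), \sum_{i=1}^k \lambda_i (v_i - w_i) \Big\rangle = \sum_{i=1}^k \lambda_i \big[ \langle \dot x(\tau), v_i \rangle - \langle \dot x(\tau), w_i \rangle \big] = 0. \]

Now using Lemma 2.13 we deduce that $-\frac{1}{\lambda} \dot x(\tau)$ achieves the distance of the
affine space, $\operatorname{aff} \partial_c f(x(\tau))$, to the origin. On the other hand, the inclusion
$-\frac{1}{\lambda} \dot x(\tau) \in \partial_c f(x(\tau))$ holds, and hence $-\frac{1}{\lambda} \dot x(\tau)$ actually achieves the distance of
$\partial_c f(x(\tau))$ to the origin. In particular, we deduce

\[ \frac{1}{\lambda} \|\dot x(\tau)\| = d(0, \partial_c f(x(\tau))). \]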



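Now for a fixed $K > l$ define the function

\[ g(x) := \inf_{y \in \mathbf{R}^n} \{ f(y) + K \|x - y\| \}. \tag{3.7} \]

This function is $K$-Lipschitz continuous on all of $\mathbf{R}^n$ and it coincides with $f$ on $\operatorname{dom} f$.
By (3.5) we have $g(x(\tau + \epsilon)) - g(x(\tau)) = -\epsilon$, so Lebourg's Mean Value Theorem
(Theorem 2.7) yields $-\epsilon = \langle y_\epsilon, x(\tau + \epsilon) - x(\tau) \rangle$
for some subgradient $y_\epsilon \in \partial_c g(z_\epsilon)$, where $z_\epsilon$ is a point lying on the line segment joining
$x(\tau + \epsilon)$ and $x(\tau)$. Dividing by $\epsilon$ and letting $\epsilon$ tend to 0, we deduce the existence of a subgradient
$y \in \partial_c g(x(\tau)) \subset \partial_c f(x(\tau))$ (Lemma 2.10) satisfying

\[ \frac{1}{\|\dot x(\tau)\|} = \Big\langle y, \frac{-\dot x(\tau)}{\|\dot x(\tau)\|} \Big\rangle = d(0, \partial_c f(x(\tau))), \tag{3.8} \]

where the second equality follows from the computation

\[ \Big\langle y, \frac{-\dot x(\tau)}{\|\dot x(\tau)\|} \Big\rangle = \Big\langle y + \operatorname{par} \partial_c f(x(\tau)), \frac{-\dot x(\tau)}{\|\dot x(\tau)\|} \Big\rangle = \Big\langle -\frac{1}{\lambda} \dot x(\tau), \frac{-\dot x(\tau)}{\|\dot x(\tau)\|} \Big\rangle = \frac{1}{\lambda} \|\dot x(\tau)\|. \]

Finally combining this with Theorem 3.4 we obtain the chain of inequalities

\[ \frac{1}{d(0, \partial f(x(\tau)))} \ge \|\dot x(\tau)\| = \frac{1}{d(0, \partial_c f(x(\tau)))} \ge \frac{1}{d(0, \partial f(x(\tau)))}. \tag{3.9} \]

Hence we have equality throughout. Observe now that the equation
$\frac{1}{\lambda} \|\dot x(\tau)\| = d(0, \partial_c f(x(\tau)))$, together with (3.8), implies $\lambda = \|\dot x(\tau)\|^2$.
Combining this with (3.9), the stronger inclusion
$\frac{\dot x(\tau)}{\|\dot x(\tau)\|^2} \in -\partial f(x(\tau))$ easily follows. □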

Theorem 3.7 (Natural reparametrization of proximal descent curves).
Consider a function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ that is locally Lipschitz continuous relative to its
domain at a point $\bar x$, with $0 \notin \partial f(\bar x)$. Suppose in addition that $f$ is level prox-stable
at $\bar x$. Then for any proximal descent curve $x\colon [0,\eta] \to \mathbf{R}^n$ of $f$ at $\bar x$, with $\eta > 0$
sufficiently small, there exists a Lipschitz continuous reparametrization $t(\tau)$ sending
the interval $[0,\eta]$ to $[0,T]$ and satisfying the following properties.

(a) $x(t)$ is Lipschitz continuous.

(b) $\operatorname{lip} f(x(t); \operatorname{dom} f) \ge \|\dot x(t)\| \ge |\nabla f|(x(t))$ a.e. on $[0,T]$.

(c) $(f \circ x)(t) = f(x(t))$ is Lipschitz continuous, strictly decreasing, and satisfies

\[ \frac{d(f \circ x)}{dt}(t) \le -\big(|\nabla f|(x(t))\big)^2, \quad \text{a.e. on } [0,T]. \]

(d) The inclusion

\[ \dot x(t) \in -\operatorname{cone} \partial_c f(x(t)) \quad \text{holds a.e. on } [0,T]. \]

If furthermore Assumption 3.5 is satisfied, then we have

\[ \dot x(t) \in -\partial f(x(t)), \quad x(0) = \bar x, \quad \text{and} \]

\[ \|\dot x(t)\| = d(0, \partial f(x(t))) = d(0, \partial_c f(x(t))), \]

for almost every $t \in [0,T]$, and moreover $x(t)$ is a curve of near-maximal slope.
Proof. Consider the function

\[ t(\tau) := \int_0^\tau \|\dot x(s)\|^2 \, ds, \]

and define $T := t(\eta)$. Theorem 3.4 implies that $t(\tau)$ is Lipschitz continuous and $\|\dot x(s)\|$
is bounded away from zero. Consequently $t(\tau)$ has a Lipschitz continuous inverse. Let
$\tau(t)$ be the inverse of $t(\tau)$, and set $x(t) := x(\tau(t))$ for $t \in [0,T]$. A trivial application
of the chain rule, along with Theorems 3.6 and 3.4, yields all of the claimed results. □
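As a numerical illustration of this reparametrization (not part of the original argument), consider the smooth one-dimensional function $f(x) = x^2/2$ with $\bar x = 1$. Assuming the parametrization by function values from (3.5), the descent curve is $x(\tau) = \sqrt{1 - 2\tau}$, so $t(\tau) = -\tfrac12 \ln(1 - 2\tau)$ and the reparametrized curve is $e^{-t}$, the classical gradient flow. The function names below are hypothetical and the quadrature rule is an arbitrary choice.

```python
import math

def x_tau(tau):
    # Descent curve of f(x) = x**2 / 2 from x = 1, parametrized so that
    # f(x(tau)) = f(1) - tau (illustrative example, assumed here).
    return math.sqrt(1.0 - 2.0 * tau)

def speed_sq(tau):
    # |x'(tau)|**2 = 1 / (1 - 2 * tau) for the curve above.
    return 1.0 / (1.0 - 2.0 * tau)

def t_of_tau(tau, steps=20000):
    # Midpoint rule for t(tau) = integral over [0, tau] of |x'(s)|**2 ds.
    h = tau / steps
    return sum(speed_sq((i + 0.5) * h) for i in range(steps)) * h

eta = 0.4
T = t_of_tau(eta)
# In the new time variable the curve should agree with the gradient flow exp(-t).
gap = abs(x_tau(eta) - math.exp(-T))
```

In this example the gap is at the level of the quadrature error, which is consistent with the claim that the reparametrized curve solves $\dot x(t) = -\nabla f(x(t))$.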



4. Viability and strengthened existence theory. Observe that the combination
of Theorems 3.4 and 3.7 shows that for any lsc, level prox-stable function
$f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ that is locally Lipschitz continuous on its domain, and that satisfies the
key Assumption 3.5, there exists a proximal descent curve that is both a curve of
near-maximal slope and a solution to the corresponding evolution equation. By far,
the most stringent of the conditions needed to establish this is level prox-stability.
In this section, we completely eliminate this condition. The following viability result
will be key. Roughly speaking, it states that given a function $f$ that is Lipschitz
continuous on its domain, we can always find a Lipschitz continuous function $g$ on all
of $\mathbf{R}^n$, agreeing with $f$ on the domain of $f$, and having the crucial property that it
admits proximal descent curves lying entirely in the domain of $f$.

Lemma 4.1 (Viability). Consider a lsc function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ that is Lipschitz
continuous on its domain with modulus $l$. Assume that there is a point $\bar x \in \operatorname{dom} f$ and
a real number $r > 0$ with $|\nabla f|(\bar x) > r$. Take $K > \max\{l, r\}$ and define the function

\[ g(x) := \inf_{y \in \mathbf{R}^n} \{ f(y) + K \|x - y\| \}. \tag{4.1} \]

Then $g$ is a $K$-Lipschitz continuous function, coinciding with $f$ on $\operatorname{dom} f$, and having
the property that there exists a proximal descent curve of $g$ at $\bar x$ lying entirely in
$\operatorname{dom} f$.
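The envelope (4.1) can be explored numerically. The following is a minimal sketch, assuming the illustrative function $f(y) = |y|$ on $[-1,1]$ (so $l = 1$), the choice $K = 2$, and an infimum restricted to a finite grid; none of these choices come from the text. It checks the two elementary properties stated in the lemma, namely that $g$ agrees with $f$ on $\operatorname{dom} f$ and is $K$-Lipschitz.

```python
import math

def f(y):
    # Illustrative extended-real-valued function: |y| on [-1, 1], +inf elsewhere.
    return abs(y) if -1.0 <= y <= 1.0 else math.inf

def envelope(x, K, grid):
    # g(x) = inf_y { f(y) + K * |x - y| }, with the infimum taken over a finite grid.
    return min(f(y) + K * abs(x - y) for y in grid)

K = 2.0
grid = [i / 100.0 for i in range(-300, 301)]  # grid on [-3, 3]
g = {x: envelope(x, K, grid) for x in grid}

# g should coincide with f wherever f is finite.
agrees_on_domain = all(abs(g[x] - f(x)) < 1e-12 for x in grid if f(x) < math.inf)
# g should be K-Lipschitz; checked here on neighbouring grid points.
lipschitz = all(abs(g[a] - g[b]) <= K * abs(a - b) + 1e-12
                for a, b in zip(grid, grid[1:]))
```

Outside the domain the envelope grows linearly with slope $K$; for instance at $x = 3$ the minimizing point is $y = 1$.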

Proof. It is well-known that $g$ is a $K$-Lipschitz continuous function on $\mathbf{R}^n$ and
that it coincides with $f$ on $\operatorname{dom} f$. Moreover, since $f$ is lsc and Lipschitz continuous
on $\operatorname{dom} f$, the infimum in (4.1) is attained for any $x$. Indeed, it is easy to see that
since we have $l < K$, minimizing sequences are bounded.

In the notation of Definition 3.1 define $\epsilon := \tau_{i+1} - \tau_i$. For sufficiently small $\eta > 0$
consider constructing curves $u_k(\tau)$. When building the sequence $u_k(\tau_i)$ for $i \in \{0, \ldots, k\}$,
there are choices to be made, since projections onto lower level sets are not necessarily
single-valued. For notational convenience, we drop the index $k$ in $u_k$ and we define
points $u_i := u_k(\tau_i)$. We claim that for any fixed $k$, we may choose the iterates $u(\tau_i)$
so that there is a sequence of points $y_i \in \operatorname{dom} f$ satisfying

\[ g(u_i) = f(y_i) + K \|u_i - y_i\|, \]



(4.2) 
(4.3) 



This is clearly true for i = 0. As the inductive hypothesis, suppose this is true for 
i = 0, . . . , j. We now consider two cases. 



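Case (a): Suppose first $\|y_j - u_j\| \ge \epsilon / K$. Then for any $u$ with $g(u) = g(u_j) - \epsilon$, we
have

\[ g(u_j) - g(u) \le K \|u - u_j\|. \]

Consequently we deduce $d(u_j, [g \le g(u_j) - \epsilon]) \ge \epsilon / K$. On the other hand, if we take
$u := u_j + \frac{\epsilon}{K} \frac{y_j - u_j}{\|y_j - u_j\|}$ we obtain

\[ g(u) \le f(y_j) + K \|u - y_j\| = f(y_j) + K \Big( \|y_j - u_j\| - \frac{\epsilon}{K} \Big) = g(u_j) - \epsilon. \]

In conjunction with the equality $\epsilon / K = \|u - u_j\|$, we deduce the inclusion
$u \in P_{[g \le g(u_j) - \epsilon]}(u_j)$.
Therefore we may set $u(\tau_{j+1}) = u$ and $y_{j+1} = y_j$. Equations (4.2) and (4.3) then
follow from the inductive hypothesis.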
Case (b): Suppose now that the inequality $\|u_j - y_j\| < \epsilon / K$ holds. In light of
Lemma 2.10, decreasing $r$ slightly we may assume $|\nabla g|(x) > r$ for each $x$ near $\bar x$.
Therefore we can take any $u_{j+1}$ such that $g(u_{j+1}) = g(u_j) - \epsilon$ and $\|u_{j+1} - u_j\| \le \epsilon / r$.

Let $y_{j+1}$ be a point achieving the minimum in the expression for $g(u_{j+1})$. We then
obtain

\[ f(y_{j+1}) + K \|u_{j+1} - y_{j+1}\| \le f(y_j) + K \|u_j - y_j\| - \epsilon < f(y_j). \]
In turn, this implies 

K \Wj+i - Vj+A\ < l hj+i - Vj\\ < Khj+i - U3+1W 

that is 

el 



ij+i 



(K-l)\\u J+1 -y j+ x\ 



< 



Equations (4.2) and (4.3) follow for $j + 1$. □



We finally come to the main result of this section.

Theorem 4.2 (Existence of steepest descent curves). Consider a function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$
that is locally Lipschitz continuous relative to its domain at a point $\bar x$, with
$0 \notin \partial f(\bar x)$. Then there exists an absolutely continuous curve $x\colon [0,T] \to \mathbf{R}^n$ with $x(0) = \bar x$,
satisfying the following properties.

(a) $x(t)$ is Lipschitz continuous.

(b) $\|\dot x(t)\| \ge |\nabla f|(x(t))$ a.e. on $[0,T]$.

(c) $(f \circ x)(t) = f(x(t))$ is Lipschitz continuous, strictly decreasing, and satisfies

\[ \frac{d(f \circ x)}{dt}(t) \le -\big(|\nabla f|(x(t))\big)^2, \quad \text{a.e. on } [0,T]. \]

(d) The inclusion

\[ \dot x(t) \in -\operatorname{cone} \partial_c f(x(t)) \quad \text{holds a.e. on } [0,T]. \]

If furthermore Assumption 3.5 is satisfied, then we have

\[ \dot x(t) \in -\partial f(x(t)), \quad x(0) = \bar x, \quad \text{and} \]

\[ \|\dot x(t)\| = d(0, \partial f(x(t))) = d(0, \partial_c f(x(t))), \]

for almost every $t \in [0,T]$, and moreover $x(t)$ is a curve of near-maximal slope.

Proof. Without loss of generality, we may assume that $f$ is Lipschitz continuous
on the entire $\operatorname{dom} f$. Lemma 4.1 yields a Lipschitz continuous function $g\colon \mathbf{R}^n \to \mathbf{R}$
and a proximal descent curve $x\colon [0,\eta] \to \mathbf{R}^n$ of $g$ at $\bar x$ lying entirely in $\operatorname{dom} f$. Then
Theorems 3.7 and 2.10 readily imply all of the claims. □



5. Comments on Assumption 3.5 and curves of near-maximal slope. In
this section, we further analyze Assumption 3.5 and the notion of a curve of
near-maximal slope. Our first goal is to prove that locally Lipschitz continuous
subdifferentially regular functions satisfy Assumption 3.5. We begin with the following key
result.

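Lemma 5.1 (Calculus). Consider a function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ and an absolutely
continuous path $x\colon [0,T) \to \mathbf{R}^n$. Then for $t \in (0,T)$ we have

\[ \frac{d(f \circ x)}{dt}(t) = \langle \hat\partial f(x(t)), \dot x(t) \rangle, \]

provided that both $x$ and $f \circ x$ are differentiable at $t$, and $\hat\partial f(x(t)) \ne \emptyset$. Consequently,
in this case the vector $\dot x(t)$ is orthogonal to $\operatorname{par} \hat\partial f(x(t))$.

Proof. Observe

\[ \frac{d(f \circ x)}{dt}(t) = \lim_{\epsilon \downarrow 0} \frac{f(x(t + \epsilon)) - f(x(t))}{\epsilon} \ge \langle v, \dot x(t) \rangle, \quad \text{for any } v \in \hat\partial f(x(t)). \]

Similarly we have

\[ \frac{d(f \circ x)}{dt}(t) = \lim_{\epsilon \downarrow 0} \frac{f(x(t - \epsilon)) - f(x(t))}{-\epsilon} \le \langle v, \dot x(t) \rangle, \quad \text{for any } v \in \hat\partial f(x(t)). \]

Hence if we have $\hat\partial f(x(t)) \ne \emptyset$, then the equation

\[ \frac{d(f \circ x)}{dt}(t) = \langle \hat\partial f(x(t)), \dot x(t) \rangle \]

holds, as claimed. Observe that for any convex set $Q$, we have $\operatorname{par} Q = \mathbf{R}(Q - Q)$. The fact
that $\dot x(t)$ is orthogonal to $\operatorname{par} \hat\partial f(x(t))$ is now immediate. □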
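A small numerical check may help to see the content of Lemma 5.1. The sketch below assumes the illustrative convex function $f(x_1, x_2) = |x_1| + x_2$ and the path $x(t) = (0, t)$, which runs along the kink of $f$; the subgradients at $(0, t)$ are the vectors $(s, 1)$ with $s \in [-1, 1]$. All names are hypothetical.

```python
def f(p):
    # f(x1, x2) = |x1| + x2, nonsmooth along the line x1 = 0.
    return abs(p[0]) + p[1]

def path(t):
    # Path running along the kink: x(t) = (0, t), so the velocity is (0, 1).
    return (0.0, t)

velocity = (0.0, 1.0)
t0, h = 0.5, 1e-6
# Central difference approximation of d(f o x)/dt at t0.
derivative = (f(path(t0 + h)) - f(path(t0 - h))) / (2 * h)

# A sample of subgradients of f at (0, t0): vectors (s, 1) with s in [-1, 1].
subgradients = [(-1.0 + 0.25 * k, 1.0) for k in range(9)]
# Every subgradient should give the same pairing with the velocity.
pairings = [v[0] * velocity[0] + v[1] * velocity[1] for v in subgradients]
```

All pairings coincide with the derivative of $f \circ x$, and differences of subgradients, which span the parallel subspace, are orthogonal to the velocity.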

Proposition 5.2 (Lipschitz continuous, subdifferentially regular functions).
Consider a function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ that is subdifferentiable, locally Lipschitz continuous
relative to $\operatorname{dom} f$, and subdifferentially regular. Then $f$ satisfies Assumption 3.5.

Proof. Consider an absolutely continuous path $x\colon [0,T) \to \operatorname{dom} f$. Then clearly
both $f \circ x$ and $x$ are differentiable almost everywhere on $[0,T)$. The result now follows
trivially from Lemma 5.1. □

The following proposition shows that for a subdifferentially regular function that 
is locally Lipschitz continuous on its domain, curves of maximal slope are exactly the 
solutions to the corresponding evolution equations. 

Proposition 5.3 (Equivalence for subdifferentially regular Lipschitz functions).
Consider a subdifferentially regular function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ that is Lipschitz continuous
relative to its domain. Then $x\colon [0,T) \to \operatorname{dom} f$ is a curve of maximal slope if and
only if it satisfies

\[ \dot x(t) \in -\partial f(x(t)), \quad \text{for almost every } t \in [0,T). \]

Proof. Consider an absolutely continuous curve $x\colon [0,T) \to \operatorname{dom} f$. Then clearly
both $x$ and $f \circ x$ are differentiable almost everywhere on $[0,T)$. Suppose now that
$x$ satisfies the subdifferential inclusion for almost every $t \in [0,T)$. Then according
to Lemma 5.1 for such $t$ we have $-\dot x(t) \in \partial f(x(t))$ and $-\dot x(t) \in (\operatorname{par} \partial f(x(t)))^\perp$.
Lemma 2.13 then implies $\|\dot x(t)\| = d(0, \partial f(x(t))) = |\nabla f|(x(t))$. In turn, Lemma 5.1
shows

\[ \frac{d(f \circ x)}{dt}(t) = -\|\dot x(t)\|^2 = -\big(|\nabla f|(x(t))\big)^2. \]

Thus $x(t)$ is a curve of maximal slope.

Conversely suppose that $x\colon [0,T) \to \mathbf{R}^n$ is a curve of maximal slope and let
$v(t) \in \partial f(x(t))$ be a vector achieving $d(0, \partial f(x(t)))$. Then again by Lemma 5.1 for
almost every $t \in [0,T)$ we have

\[ -\big(|\nabla f|(x(t))\big)^2 = \frac{d(f \circ x)}{dt}(t) = \langle v(t), \dot x(t) \rangle. \tag{5.1} \]

Consequently the inequality $\big(|\nabla f|(x(t))\big)^2 \le \|v(t)\| \, \|\dot x(t)\|$ holds, with equality if and
only if $v(t)$ and $\dot x(t)$ are collinear. Observe that $v(t)$ and $\dot x(t)$ are indeed collinear,
since if it were otherwise we would have

\[ \big(|\nabla f|(x(t))\big)^2 < \|v(t)\| \, \|\dot x(t)\| = \big(|\nabla f|(x(t))\big)^2, \]

a contradiction. Using the equations

\[ \|v(t)\| = \|\dot x(t)\| = |\nabla f|(x(t)), \]

and (5.1), we deduce $-\dot x(t) = v(t) \in \partial f(x(t))$, as we needed to show. □

Remark 5.4. The exact analogues of Propositions 5.2 and 5.3 hold for lsc convex,
and more generally, for lower-$C^2$ functions $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$. The proof is almost identical,
except one also needs to use [51 Lemma 3.3].

6. The semi-algebraic setting. In this section we consider to what extent
the results obtained thus far can be strengthened when the function $f$ in question
is semi-algebraic. For an extensive discussion on semi-algebraic geometry, see the
monographs of Basu-Pollack-Roy 0], Lou van den Dries [35], and Shiota [27]. For
a quick survey, see the article of van den Dries-Miller [29] and the surveys of Coste
[HI [10]. Unless otherwise stated, we follow the notation of [29] and [TT].

A semi-algebraic set $S \subset \mathbf{R}^n$ is a finite union of sets of the form

\[ \{ x \in \mathbf{R}^n : P_1(x) = 0, \ldots, P_k(x) = 0,\ Q_1(x) < 0, \ldots, Q_l(x) < 0 \}, \]

where $P_1, \ldots, P_k$ and $Q_1, \ldots, Q_l$ are polynomials in $n$ variables. In other words, $S$
is a union of finitely many sets, each defined by finitely many polynomial equalities
and inequalities. A function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ is semi-algebraic if $\operatorname{epi} f \subset \mathbf{R}^{n+1}$ is a
semi-algebraic set.
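To make the definition concrete, the following sketch tests membership in a finite union of basic sets, each described by polynomial equalities and strict inequalities. The helper names and the tolerance for the equalities are assumptions made for illustration. The example describes the epigraph of $f(x) = |x|$ in $\mathbf{R}^2$, which shows that this function is semi-algebraic.

```python
def in_basic_set(x, equalities, strict_inequalities, tol=1e-12):
    # True when every P(x) = 0 (up to tol) and every Q(x) < 0.
    return (all(abs(P(x)) <= tol for P in equalities)
            and all(Q(x) < 0 for Q in strict_inequalities))

def in_semialgebraic_set(x, pieces):
    # A semi-algebraic set is a finite union of basic sets.
    return any(in_basic_set(x, eqs, ineqs) for eqs, ineqs in pieces)

# Epigraph of |x|: points p = (x, r) with r >= |x|, written as a union of
# pieces that use only "= 0" and "< 0" polynomial constraints.
pieces = [
    ([], [lambda p: p[0] - p[1], lambda p: -p[0] - p[1]]),  # r > |x|
    ([lambda p: p[1] - p[0]], [lambda p: -p[0]]),           # r = x,  x > 0
    ([lambda p: p[1] + p[0]], [lambda p: p[0]]),            # r = -x, x < 0
    ([lambda p: p[0], lambda p: p[1]], []),                 # the origin
]
```

For example, `in_semialgebraic_set((1.0, 1.0), pieces)` holds because the point lies on the boundary piece $r = x$, $x > 0$, while `(1.0, 0.5)` lies below the graph and belongs to no piece.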

Our immediate goal is to show that Assumption 3.5 holds for all semi-algebraic
functions. We first record the following simple lemma, which in fact has nothing to
do with semi-algebraicity.

Lemma 6.1. Consider a set $M \subset \mathbf{R}^n$ and a path $x\colon [0,\eta) \to \mathbf{R}^n$ that is
differentiable almost everywhere on $[0,\eta)$. Then for almost every $t \in [0,\eta)$, the implication

\[ x(t) \in M \implies \dot x(t) \in T_M(x(t)) \]

holds.

The following is a key property of semi-algebraic functions that we will use.

Theorem 6.2 (Projection formula). Consider a semi-algebraic function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$.
Then there exists a partition of $\operatorname{dom} f$ into finitely many $C^2$-manifolds $\{M_i\}$ so
that for any manifold $M_i$ and any point $x \in M_i$, the inclusion

\[ \operatorname{par} \partial_c f(x) \subset N_{M_i}(x) \]

holds.

Proposition 6.3 (Semi-algebraic functions satisfy Assumption 3.5). Consider
a semi-algebraic function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$. Then $f$ satisfies Assumption 3.5.

Proof. Consider the partition of $\operatorname{dom} f$ into finitely many $C^2$-manifolds $\{M_i\}$
guaranteed to exist by Theorem 6.2 and let $x\colon [0,\eta) \to \operatorname{dom} f$ be any absolutely
continuous path. Since there are only finitely many manifolds $M_i$, applying Lemma 6.1
we deduce that for any index $i$ and for almost every $t \in [0,\eta)$, the implication

\[ x(t) \in M_i \implies \dot x(t) \in T_{M_i}(x(t)) \]

holds. On the other hand by Theorem 6.2 for any point $x$ lying in a stratum $M_i$ we
have $\partial_c f(x) \subset v + N_{M_i}(x)$, for some vector $v \in \mathbf{R}^n$. The result follows. □

Next we obtain a semi-algebraic counterpart of Proposition 5.3, where we completely
eliminate the subdifferential regularity requirement.

Proposition 6.4 (Equivalence for semi-algebraic Lipschitz functions).
Consider a semi-algebraic function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ that is Lipschitz relative to its domain.
Then $x\colon [0,T) \to \operatorname{dom} f$ is a curve of near-maximal slope if and only if it satisfies

\[ \dot x(t) \in -\partial f(x(t)), \quad \text{for almost every } t \in [0,T). \]

Proof. Consider the partition of $\operatorname{dom} f$ into finitely many $C^2$-manifolds $\{M_i\}$,
guaranteed to exist by Theorem 6.2. We first record some preliminary observations.
Clearly both $x$ and $f \circ x$ are differentiable at a.e. $t \in [0,T)$. Furthermore, in light of
Lemma 6.1 for any index $i$ and for a.e. $t \in [0,T)$ the implication

\[ x(t) \in M_i \implies \dot x(t) \in T_{M_i}(x(t)) \]

holds. Now suppose that for such $t$, the point $x(t)$ lies in a stratum $M_i$ and let $g\colon \mathbf{R}^n \to \mathbf{R}$ be
a $C^1$-smooth function agreeing with $f$ on a neighborhood of $x(t)$ in $M_i$. Lipschitzness
of $f$ on its domain then easily implies

\begin{align*}
\frac{d(f \circ x)}{dt}(t) &= \lim_{\epsilon \to 0} \frac{f(x(t + \epsilon)) - f(x(t))}{\epsilon} \\
&= \lim_{\epsilon \to 0} \frac{f(P_{M_i}(x(t + \epsilon))) - f(x(t))}{\epsilon} \\
&= \lim_{\epsilon \to 0} \frac{g(P_{M_i}(x(t + \epsilon))) - g(x(t))}{\epsilon} \\
&= \frac{d}{dt} \big( g \circ P_{M_i} \circ x \big)(t) = \langle \nabla g(x(t)), \dot x(t) \rangle \\
&= \langle \nabla g(x(t)) + N_{M_i}(x(t)), \dot x(t) \rangle,
\end{align*}

where the last equality uses the inclusion $\dot x(t) \in T_{M_i}(x(t))$.

Suppose now that $x\colon [0,T) \to \mathbf{R}^n$ satisfies the subdifferential inclusion for a.e.
$t \in [0,T)$. Then we have $-\dot x(t) \in \partial f(x(t))$ and $\dot x(t) \in T_{M_i}(x(t))$. Hence Lemma 2.13
implies $\|\dot x(t)\| = |\nabla f|(x(t))$ for a.e. $t \in [0,T)$, and the computation above gives
$\frac{d(f \circ x)}{dt}(t) = -\|\dot x(t)\|^2$. We conclude that $x(t)$ is a curve of near-maximal slope.

Conversely, suppose that $x(t)$ is a curve of near-maximal slope. Then we have

\[ -\big(|\nabla f|(x(t))\big)^2 = \frac{d(f \circ x)}{dt}(t) = \langle \nabla g(x(t)) + N_{M_i}(x(t)), \dot x(t) \rangle. \]

Now let $v(t) \in \partial f(x(t))$ be a vector achieving $d(0, \partial f(x(t)))$. Then from the equation
above we conclude $\big(|\nabla f|(x(t))\big)^2 \le \|v(t)\| \, \|\dot x(t)\|$, with equality if and only if $\dot x(t)$ and
$v(t)$ are collinear. On the other hand we clearly have equality in this expression, and
consequently we deduce $-\dot x(t) = v(t) \in \partial f(x(t))$ for a.e. $t \in [0,T)$. □

We end this section by showing that for semi-algebraic functions, bounded curves
of near-maximal slope necessarily have bounded length. None of the arguments we
present are new; rather we include this discussion with the purpose of painting a more
complete picture for the reader. We begin with the celebrated Kurdyka-Łojasiewicz
inequality.

Definition 6.5 (Kurdyka-Łojasiewicz inequality). A function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$ is
said to satisfy the Kurdyka-Łojasiewicz inequality if for any bounded open set $U \subset \mathbf{R}^n$
and any real $\tau$, there exists $\rho > 0$ and a non-negative continuous function
$\psi\colon [\tau, \tau + \rho) \to \mathbf{R}$, which is $C^1$-smooth and strictly increasing on $(\tau, \tau + \rho)$, and such that the
inequality

\[ |\nabla(\psi \circ f)|(x) \ge 1 \]

holds for all $x \in U$ with $\tau < f(x) < \tau + \rho$.
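For a concrete instance of Definition 6.5, take the illustrative function $f(x) = x^2$ on the real line with $\tau = 0$ and the candidate $\psi(s) = \sqrt{s}$; both choices are assumptions made here for the example. For a smooth function the slope is the absolute value of the derivative, so by the chain rule the slope of $\psi \circ f$ at $x \ne 0$ is $\psi'(f(x)) \, |f'(x)|$, which the sketch below evaluates on a grid.

```python
import math

def f(x):
    return x * x

def slope_f(x):
    # For a smooth function the slope equals |f'(x)|.
    return abs(2.0 * x)

def psi_prime(s):
    # psi(s) = sqrt(s), hence psi'(s) = 1 / (2 * sqrt(s)) for s > 0.
    return 0.5 / math.sqrt(s)

# Slope of psi o f at x != 0 via the chain rule: psi'(f(x)) * |f'(x)|.
points = [k / 50.0 for k in range(-50, 51) if k != 0]
kl_values = [psi_prime(f(x)) * slope_f(x) for x in points]
```

Every value equals 1 up to rounding, so the inequality holds with equality for this choice of $\psi$, even though the slope of $f$ itself tends to zero near the critical point.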

In particular, all semi-algebraic functions satisfy the Kurdyka-Łojasiewicz
inequality [19]. For an extensive study of the Kurdyka-Łojasiewicz inequality and a
description of its historical significance, see for example [5]. The proof of the following
theorem is almost identical to the proof of [HI Theorem 7.1]; hence we only provide a
sketch. In fact, the theorem remains valid if rather than assuming semi-algebraicity,
we only assume that the Kurdyka-Łojasiewicz inequality is satisfied.

Theorem 6.6 (Lengths of curves of near-maximal slope).
Consider a lsc, semi-algebraic function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$, and let $U$ be a bounded subset
of $\mathbf{R}^n$. Then there exists a number $N > 0$ such that the length of any curve of
near-maximal slope for $f$ lying in $U$ does not exceed $N$.

Proof. Let $x\colon [0,T) \to \mathbf{R}^n$ be a curve of near-maximal slope for $f$ and let $\psi$ be any
strictly increasing $C^1$-smooth function on an interval containing the image of $f \circ x$. It
is then easy to see that, up to a reparametrization, $x$ is a curve of near-maximal
slope for the composite function $\psi \circ f$. In particular, we may assume that $f$ is bounded
on $U$, since otherwise we may for example replace $f$ by $\psi \circ f$ where $\psi(t) = \frac{t}{\sqrt{1 + t^2}}$.

Define the function

\[ \xi(s) = \inf\{ |\nabla f|(x) : x \in U,\ f(x) = s \}. \]

Standard arguments show that $\xi$ is semi-algebraic. Consequently, with an exception
of finitely many points, the domain of $\xi$ is a union of finitely many open intervals
$(\alpha_i, \beta_i)$, with $\xi$ continuous and either strictly monotone or constant on each such
interval. Define for each index $i$, the quantity

\[ c_i = \inf\{ \xi(s) : s \in (\alpha_i, \beta_i) \}. \]

We first claim that $\xi$ is strictly positive on each interval $(\alpha_i, \beta_i)$. This is clear for
indices $i$ with $c_i > 0$. On the other hand if we have $c_i = 0$, then by Sard's theorem
[18] the function $\xi$ is strictly positive on $(\alpha_i, \beta_i)$ as well.
Define $\zeta_i$ and $\eta_i$ by

\[ \zeta_i = \inf\{ t : f(x(t)) = \beta_i \} \quad \text{and} \quad \eta_i = \sup\{ t : f(x(t)) = \alpha_i \}, \]

and let $l_i$ be the length of $x(t)$ between $\zeta_i$ and $\eta_i$. Since $f \circ x$ is decreasing, we have
$\zeta_i \le \eta_i$. Then we have

\[ l_i = \int_{\zeta_i}^{\eta_i} \|\dot x(t)\| \, dt = \int_{\zeta_i}^{\eta_i} |\nabla f|(x(t)) \, dt \le (\eta_i - \zeta_i)^{1/2} \Big( \int_{\zeta_i}^{\eta_i} \big(|\nabla f|(x(t))\big)^2 \, dt \Big)^{1/2}. \]

On the other hand, observe

\[ \int_{\zeta_i}^{\eta_i} \big(|\nabla f|(x(t))\big)^2 \, dt = f(x(\zeta_i)) - f(x(\eta_i)) = \beta_i - \alpha_i. \]

Finally in the case $c_i > 0$ we have $l_i \ge c_i (\eta_i - \zeta_i)$, which combined with the two
equations above yields the bound

\[ l_i \le \frac{\beta_i - \alpha_i}{c_i}. \]

If the equation $c_i = 0$ holds, then by the Kurdyka-Łojasiewicz inequality we can
find a continuous function $\psi_i\colon [\alpha_i, \alpha_i + \rho) \to \mathbf{R}$, for some $\rho > 0$, where $\psi_i$ is strictly
increasing and $C^1$-smooth on $(\alpha_i, \alpha_i + \rho)$ and satisfying $|\nabla(\psi_i \circ f)|(y) \ge 1$ for any $y \in U$
with $\alpha_i < f(y) < \alpha_i + \rho$. Since $\psi_i$ is strictly increasing on $(\alpha_i, \alpha_i + \rho)$, it is not
difficult to check that we may extend $\psi_i$ to a continuous function on $[\alpha_i, \beta_i]$ and so
that this extension is $C^1$-smooth and strictly increasing on $(\alpha_i, \beta_i)$ with the inequality
$|\nabla(\psi_i \circ f)|(y) \ge 1$ being valid for any $y \in U$ with $\alpha_i < f(y) < \beta_i$.

Then as we have seen before, up to a reparametrization, the curve $x(t)$ for
$t \in [\zeta_i, \eta_i]$ is a curve of near-maximal slope for the function $\psi_i \circ f$. Then as above, we
obtain the bound $l_i \le \psi_i(\beta_i) - \psi_i(\alpha_i)$.

We conclude that the length of the curve $x(t)$ is bounded by a constant that
depends only on $f$ and on $U$, thereby completing the proof. □
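The length bound can be observed on a simple smooth example. The sketch below assumes the illustrative function $f(x_1, x_2) = x_1^2 + 4 x_2^2$, for which $|\nabla f| \ge 2\sqrt{f}$ and hence $\psi(s) = \sqrt{s}$ satisfies $|\nabla(\psi \circ f)| \ge 1$ away from the origin. It approximates the gradient flow, which for a smooth function satisfies the two relations used in the proof, by explicit Euler steps; the step size and horizon are arbitrary choices, and the computed polygonal length is only an approximation of the true length.

```python
import math

def f(p):
    return p[0] ** 2 + 4.0 * p[1] ** 2

def grad(p):
    # Gradient of f(x1, x2) = x1**2 + 4 * x2**2.
    return (2.0 * p[0], 8.0 * p[1])

def flow_length(p, step=1e-3, n_steps=10000):
    # Explicit Euler approximation of x' = -grad f(x); returns the length
    # of the resulting polygonal path.
    length = 0.0
    for _ in range(n_steps):
        g = grad(p)
        q = (p[0] - step * g[0], p[1] - step * g[1])
        length += math.hypot(q[0] - p[0], q[1] - p[1])
        p = q
    return length

start = (1.0, 1.0)
length = flow_length(start)
# Bound from the argument above: psi(f(start)) - psi(0) with psi(s) = sqrt(s).
bound = math.sqrt(f(start))
```

The computed length stays below $\sqrt{f(\bar x)} = \sqrt{5}$, in line with the bound $l_i \le \psi_i(\beta_i) - \psi_i(\alpha_i)$.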

Corollary 6.7. Consider a lsc, semi-algebraic function $f\colon \mathbf{R}^n \to \overline{\mathbf{R}}$. Then
any curve of near-maximal slope for $f$ that is bounded and has a maximal domain
of definition converges to a generalized critical point of $f$ (a point $x$ satisfying
$0 \in \partial f(x)$).